feat: add import profiler tool with dynamic loaded lines code volume …#17467
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a process-isolated Python SDK Import Profiler tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler measures initialization latency, peak memory usage, and dynamic code volume across multiple cold-start iterations. Feedback on the implementation highlights several key improvements: adding a guard to prevent crashes during single-iteration runs, replacing absolute local file paths in the documentation with relative links, handling .pyc files and logging exceptions during line counting, and exposing the worker process's stderr to facilitate debugging when subprocesses fail.
- Add fallback to handle single-iteration quantile calculation crashes - Normalize .pyc file paths during line counting and log exceptions - Expose worker process stderr to facilitate debugging CalledProcessError - Fix absolute paths in documentation.md to use relative directory paths
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK import profiler tool along with its documentation. The profiler is designed to run process-isolated benchmarks to measure import latency, peak memory usage, and dynamic code volume. The feedback provided aims to improve the robustness of the tool. Key recommendations include using importlib.util.source_from_cache to correctly resolve source paths from compiled .pyc files, moving imports to the module level, parsing only the last line of subprocess stdout to handle noisy module initializations safely, and explicitly specifying encoding="utf-8" when writing files to ensure platform-independent behavior.
| import importlib | ||
| import csv | ||
| import os |
…g, and encodings - Use importlib.util.source_from_cache for robust .pyc resolution - Move importlib.util and logging imports to module level - Refine json.loads to parse only the last line of stdout - Specify UTF-8 encoding when opening files for writing
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK import profiling tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler uses a process-isolated master/worker architecture to accurately measure cold-start import latency, peak memory usage, and dynamic code volume. The review feedback highlights two key areas for improvement: robustly parsing the worker's stdout in run_master to prevent crashes from unexpected output (such as warnings) and validating the iteration count, and checking the subprocess return code in run_trace to warn users if an import fails instead of silently writing the error log.
| def run_trace(target_module): | ||
| """Generates importtime trace log and writes it to a file.""" | ||
| trace_file = f"import_trace_{target_module.replace('.', '_')}.log" | ||
| print(f"Generating importtime trace log for {target_module} -> {trace_file}...") | ||
|
|
||
| # We run: python -X importtime -c "import <module>" | ||
| result = subprocess.run( | ||
| [sys.executable, "-X", "importtime", "-c", f"import {target_module}"], | ||
| capture_output=True, text=True | ||
| ) | ||
|
|
||
| with open(trace_file, "w", encoding="utf-8") as f: | ||
| f.write(result.stderr) | ||
|
|
||
| print(f"Trace log successfully written to {trace_file}") |
There was a problem hiding this comment.
In run_trace, if the import fails, subprocess.run will complete with a non-zero exit code, but the script will silently write the stderr to the trace file and print a success message without warning the user that the import actually failed. Adding a check for result.returncode and printing a warning with the worker's stdout/stderr will significantly improve usability and prevent users from analyzing incomplete or failed traces.
def run_trace(target_module):
"""Generates importtime trace log and writes it to a file."""
trace_file = f"import_trace_{target_module.replace('.', '_')}.log"
print(f"Generating importtime trace log for {target_module} -> {trace_file}...")
# We run: python -X importtime -c "import <module>"
result = subprocess.run(
[sys.executable, "-X", "importtime", "-c", f"import {target_module}"],
capture_output=True, text=True
)
if result.returncode != 0:
print(f"WARNING: Import failed with exit code {result.returncode}. The trace log may be incomplete or contain errors.", file=sys.stderr)
if result.stdout:
print(f"Worker stdout:\\n{result.stdout}", file=sys.stderr)
if result.stderr:
print(f"Worker stderr:\\n{result.stderr}", file=sys.stderr)
with open(trace_file, "w", encoding="utf-8") as f:
f.write(result.stderr)
print(f"Trace log successfully written to {trace_file}")- Extract worker logic to _run_worker_and_parse for robust JSON parsing - Add early validation for iterations parameter in run_master - Add non-zero exit code check and warning for run_trace failures
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK Import Profiler, which includes a comprehensive documentation guide and a self-spawning profiling script (profiler.py) designed to benchmark import latency, peak memory, and dynamic code volume using a process-isolated master/worker architecture. The feedback recommends enhancing the robustness of the worker's stdout parsing by prefixing metrics with a unique identifier to prevent interference from module initialization logs, and replacing the manual CLI argument parsing with the standard argparse library for improved usability and maintainability.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK Import Profiler tool, consisting of a detailed documentation guide and a self-spawning, process-isolated benchmarking script (profiler.py) designed to measure cold-start import latency, peak memory usage, and dynamic code volume. The reviewer's feedback focuses on improving the robustness and usability of the script. Key recommendations include converting the cpu parameter to a string to prevent potential runtime errors, automatically creating parent directories for the CSV export path to avoid FileNotFoundError, and replacing manual CLI argument parsing with Python's standard argparse module for a more robust and user-friendly interface.
I am having trouble creating individual review comments. Click here to see my feedback.
scripts/import_profiler/profiler.py (78-81)
If cpu is passed as an integer (e.g., cpu=0), calling cpu.lower() will raise an AttributeError. Additionally, passing an integer to the subprocess command list will cause a TypeError during execution. Converting cpu to a string at the beginning of run_master ensures robust handling of both integer and string inputs.
def run_master(iterations, target_module, cpu="0", csv_path=None):
cpu = str(cpu) if cpu is not None else "none"
if iterations < 1:
raise ValueError("Number of iterations must be at least 1.")
scripts/import_profiler/profiler.py (127-128)
If the provided csv_path contains nested directories that do not exist yet, open(csv_path, "w") will raise a FileNotFoundError. Automatically creating the parent directories using os.makedirs makes the tool more robust and utilizes the otherwise unused os import.
if csv_path:
os.makedirs(os.path.dirname(os.path.abspath(csv_path)), exist_ok=True)
with open(csv_path, "w", newline="", encoding="utf-8") as f:
scripts/import_profiler/profiler.py (232-267)
Using manual argument parsing is error-prone, doesn't support standard --help / -h flags (which currently falls back to running the full benchmark), and only supports specific formats like --module=value instead of --module value. Replacing this with Python's standard argparse module provides robust validation, automatic help documentation, and a much better user experience.
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(
description="Python SDK Import Profiler: benchmark and profile module imports."
)
parser.add_argument(
"--module",
default="google.cloud.compute",
help="Target module to profile (e.g., google.cloud.aiplatform)"
)
parser.add_argument(
"--iterations",
type=int,
default=50,
help="Number of cold-start iterations to run"
)
parser.add_argument(
"--cpu",
default="0",
help="CPU core to pin processes to using taskset (set to 'none' to disable)"
)
parser.add_argument(
"--csv",
help="Path to export raw iteration metrics as a CSV file"
)
parser.add_argument(
"--trace",
action="store_true",
help="Generate an importtime trace log file"
)
parser.add_argument(
"--cprofile",
action="store_true",
help="Run cProfile to capture latency stack traces"
)
parser.add_argument(
"--mprofile",
action="store_true",
help="Run tracemalloc memory snapshot"
)
parser.add_argument(
"--worker",
action="store_true",
help="Run in worker mode (internal use)"
)
args = parser.parse_args()
if args.worker:
run_worker(args.module)
elif args.trace:
run_trace(args.module)
elif args.cprofile:
run_cprofile(args.module)
elif args.mprofile:
run_mprofile(args.module)
else:
run_master(args.iterations, args.module, args.cpu, args.csv)|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK Import Profiler tool, consisting of a master/worker profiling script (profiler.py) and its comprehensive documentation (documentation.md). The profiler measures import latency, peak memory, and dynamic code volume in a process-isolated manner to avoid caching issues. The feedback suggests a platform-aware improvement to default the CPU pinning option to 'none' on non-Linux systems, preventing unnecessary warnings and fallback attempts on macOS or Windows.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK Import Profiler tool, consisting of a comprehensive documentation guide and a master/worker execution script (profiler.py) designed to benchmark module import performance under cold-start conditions. Feedback on the implementation highlights opportunities to preserve the cold-start design by running both cProfile and tracemalloc profiling in isolated subprocesses rather than directly within the master process. Additionally, it is recommended to separate FileNotFoundError and subprocess.CalledProcessError handling in the master process to avoid misleading warnings about taskset availability when a worker process crashes.
| def run_cprofile(target_module): | ||
| """Runs cProfile to capture stack traces for latency.""" | ||
| import cProfile | ||
| import pstats | ||
|
|
||
| prof_file = f"cprofile_{target_module.replace('.', '_')}.prof" | ||
| print(f"Generating cProfile data for {target_module} -> {prof_file}...") | ||
|
|
||
| # Run profiling | ||
| pr = cProfile.Profile() | ||
| pr.enable() | ||
| importlib.import_module(target_module) | ||
| pr.disable() | ||
|
|
||
| # Save for flame charts (e.g. via snakeviz) | ||
| pr.dump_stats(prof_file) | ||
| print(f"cProfile stats successfully written to {prof_file}") | ||
|
|
||
| # Print top bottlenecks | ||
| print("\n--- Top 15 functions by cumulative time ---") | ||
| ps = pstats.Stats(pr).sort_stats(pstats.SortKey.CUMULATIVE) | ||
| ps.print_stats(15) |
There was a problem hiding this comment.
Running cProfile directly in the master process defeats the "cold start" design of the profiler. Since profiler.py already imports heavy standard library modules (like subprocess, json, csv, logging, tracemalloc, statistics), any imports of these modules by the target module will be resolved instantly from Python's sys.modules cache and won't be captured in the profile.
To ensure a true cold-start profile, cProfile should be executed in an isolated subprocess.
def run_cprofile(target_module):
"""Runs cProfile in a clean subprocess to capture stack traces for latency."""
import pstats
prof_file = f"cprofile_{target_module.replace('.', '_')}.prof"
print(f"Generating cProfile data for {target_module} -> {prof_file}...")
# Run profiling in a clean subprocess to ensure cold-start
result = subprocess.run(
[sys.executable, "-m", "cProfile", "-o", prof_file, "-c", f"import importlib; importlib.import_module('{target_module}')"],
capture_output=True, text=True
)
if result.returncode != 0:
print(f"Error generating cProfile data:\n{result.stderr}", file=sys.stderr)
return
print(f"cProfile stats successfully written to {prof_file}")
# Print top bottlenecks
print("\n--- Top 15 functions by cumulative time ---")
ps = pstats.Stats(prof_file).sort_stats(pstats.SortKey.CUMULATIVE)
ps.print_stats(15)- Execute cProfile in a fresh subprocess to ensure cold-start accuracy - Execute tracemalloc in a fresh subprocess to capture all memory allocations - Clean up master process exception handling for missing taskset vs crashed workers
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a Python SDK Import Profiler tool, which includes a comprehensive documentation file (documentation.md) and a core profiling script (profiler.py) designed to measure import latency, peak memory, and dynamic code volume using process-isolated worker runs. The review feedback highlights several critical security and robustness improvements: addressing potential code injection vulnerabilities in run_trace, run_cprofile, and run_mprofile by safely escaping the target_module parameter using json.dumps(), and preventing potential UnicodeDecodeError failures in run_worker when reading source files by adding errors='ignore' to the file-opening logic.
| "import tracemalloc\n" | ||
| "import importlib\n" | ||
| "tracemalloc.start()\n" | ||
| f"importlib.import_module('{target_module}')\n" | ||
| "snapshot = tracemalloc.take_snapshot()\n" | ||
| "tracemalloc.stop()\n" | ||
| "top_stats = snapshot.statistics('lineno')\n" | ||
| "for stat in top_stats[:15]:\n" | ||
| " print(stat)\n" | ||
| ) | ||
| result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True) |
There was a problem hiding this comment.
Directly interpolating target_module into the code string is vulnerable to code injection. Use json.dumps() to safely escape the module name.
code = (
"import tracemalloc\n"
"import importlib\n"
"tracemalloc.start()\n"
f"importlib.import_module({json.dumps(target_module)})\n"
"snapshot = tracemalloc.take_snapshot()\n"
"tracemalloc.stop()\n"
"top_stats = snapshot.statistics('lineno')\n"
"for stat in top_stats[:15]:\n"
" print(stat)\n"
)Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)
|
/gemini review |
| * **[plan.md](./plan.md)**: The current project phases and roadmap checklist. | ||
| * **[status.md](./status.md)**: Tracks the active task state and hosts recorded baseline performance metrics. |
- Recursively wipes __pycache__ and .pyc files via pathlib prior to iteration loops. - Employs Python's -B flag on worker subprocesses to prevent re-caching during the benchmark, enforcing raw .py parsing for every execution.
- Replaces --clear-pycache with an opt-out --keep-pycache flag - The profiler now automatically sweeps and enforces disk-level cold starts without requiring explicit flags from the user.
Address PR review comment by moving setup tracking (rss_before, modules_before) outside of the time.perf_counter() window to prevent artificial latency inflation.
Addresses PR feedback to avoid broad exception catching when parsing physical python files for line counts.
Addresses PR review by explicitly documenting the expected PEP 3147/488 error path for irregular .pyc files.
…ssing modules Addresses PR review by replacing null-checks with a cleaner try/except block that assumes normal module behavior and explicitly logs when a C-extension or built-in is skipped due to a missing __file__ attribute.
Addresses PR review by replacing the 'none' string check with a typed NO_CPU_PINNING = -1 constant for cleaner argument parsing and state tracking.
…askset Addresses PR review by removing the fallback unpinned execution logic when taskset is not installed. The script now fails immediately with a clear error message, preventing code duplication.
Addresses PR review by collapsing the repeated statistics.quantiles logic into a reusable _calculate_percentiles helper method.
…terations Addresses PR review by validating that 'loaded_modules' and 'loaded_lines' remain deterministic across cold-start iterations, warning the user if non-deterministic import paths are triggered.
… tracemalloc profiling Addresses PR review by replacing the hard-to-maintain dynamic code string inside run_mprofile with a properly structured multiprocessing 'spawn' context.
| "import tracemalloc\n" | ||
| "import importlib\n" | ||
| "tracemalloc.start()\n" | ||
| f"importlib.import_module({json.dumps(target_module)})\n" |
There was a problem hiding this comment.
This still feels a little risky, since it could result in arbitrary code execution, depending on how the environment is set up
How many packages do you plan to test against? If it's a small number, can we hard-code a whitelist? If you wanted to run against all our libraries, Gemini suggested this code to dynamically build one from all installed google.cloud packages (we'd probably have to modify it a bit for non-cloud libraries too though)
import pkgutil
import google.cloud
# Dynamically find all installed legitimate google.cloud submodules
VALID_GCP_MODULES = {
f"google.cloud.{name}"
for _, name, _ in pkgutil.iter_modules(google.cloud.__path__)
}
def generate_benchmark_code(target_module):
if target_module not in VALID_GCP_MODULES:
raise ValueError("Invalid or uninstalled module requested.")
## Overview
This is an initiative to resolve severe initialization bottlenecks
(~10s-13s) in generated client libraries by adopting **Native Python
3.15 Explicit Lazy Imports (PEP 0810)**.
To avoid introducing maintenance burdens or subtle breaking changes via
custom import hooks, this PR focuses *exclusively* on utilizing native
interpreter standards.
### Implementation Architecture
We modify the upstream GAPIC Generator to emit a native
`__lazy_modules__` set at the top of the `__init__.py` files,
immediately followed by the standard eager imports.
* **Python 3.15+ (Native Acceleration):** The Python 3.15 runtime
natively intercepts the standard imports referenced in the
`__lazy_modules__` set and treats them as fast C-level lazy proxies.
Startup times drop from ~13s to ~200ms, and peak RAM consumption drops
by up to 85%.
* **Python 3.14 and Older (Zero Blast Radius):** Older Python
environments safely ignore the `__lazy_modules__` variable and process
the standard eager imports exactly as they do today. This guarantees
100% backwards compatibility with zero risk.
* **Perfect Static Typing:** Because standard imports are still present
in the file, IDEs (IntelliSense) and static analyzers (MyPy) maintain
zero-friction support.
**Example Structure:**
```python
# 1. Native C-Level Lazy Proxying for Python 3.15+ (PEP 0810)
__lazy_modules__ = {
f"{__name__}.services.accelerator_types",
f"{__name__}.services.addresses",
f"{__name__}.types.compute",
}
# 2. Standard eager imports
# Evaluated instantly by older Python versions. Intercepted natively by Python 3.15.
from .services.accelerator_types import AcceleratorTypesClient
from .services.addresses import AddressesClient
from .types.compute import Address
__all__ = (
"AcceleratorTypesClient",
"AddressesClient",
"Address",
)
```
## 🚀 Performance Benchmarking Results (Disk-Level Cold Start)
We utilized our custom `profiler.py` script #17467 to run 5 iterations
measuring true disk-level cold starts (with `__pycache__` sweeps) on the
`google-cloud-compute` library. The results validate this Phase 1 PEP
810 Native Lazy Loading implementation:
### Python 3.14 (Current Eager Import)
* **Time (Median):** `23,246.97 ms` (~23.2 seconds)
* **Memory (Physical RSS):** `224.87 MB`
* **Code Volume:** Loaded `1,407` modules (`1,040,992` lines of code)
### Python 3.15.0b2 (Phase 1 PEP 810 Lazy Proxy)
* **Time (Median):** `1,062.87 ms` (~1.06 seconds)
* **Memory (Physical RSS):** `7.21 MB`
* **Code Volume:** Loaded `15` modules (`8,064` lines of code)
**Impact:** By natively deferring the 120+ submodules at the C-level, we
bypassed parsing over **1 million lines of code**. This resulted in an
**approximately 22x reduction in startup latency (~95% speedup)**, and a
staggering **96% reduction in peak RAM consumption** (dropping from
~225MB down to just ~7MB).
Related Links:
see #17594 for a demonstration of this generator change applied to
google-cloud-compute package.
Design Doc: go/sdk-performance-design
Towards googleapis/python-aiplatform#4749
…ecution Addresses PR review feedback regarding the string-based dynamic code execution in run_trace and run_cprofile. By ensuring the --module argument strictly consists of valid Python identifiers via argparse type validation, we prevent any possibility of arbitrary code injection without requiring a hardcoded whitelist or polluting the cold-start cache.
| if cpu != NO_CPU_PINNING: | ||
| print(f"CPU Pinning enabled: Pinning processes to core {cpu} using taskset.") | ||
| else: | ||
| print("CPU Pinning disabled.") |
There was a problem hiding this comment.
Running taskset is Linux-specific. If CPU pinning is explicitly requested on macOS or Windows, the script may throw a FileNotFoundError and crashes.
Fallback gracefully to NO_CPU_PINNING on non-Linux platforms.
| rss_memories, p50_rss, p90_rss, p99_rss): | ||
| """Helper method to format and print the final benchmark results.""" | ||
| def _format_stats(title, data, p50, p90, p99, fmt): | ||
| if not data: return "" |
There was a problem hiding this comment.
Python's style guide discourages placing the body of a compound statement on the same line.
| except KeyError: | ||
| logging.debug(f"Module {m} disappeared from sys.modules during execution.") | ||
| except AttributeError: | ||
| logging.debug(f"Module {m} has no __file__ attribute (likely a C-extension or built-in). Skipping.") |
There was a problem hiding this comment.
The script uses logging.debug() but logging is never configured (via logging.basicConfig()).
Depending on how simple you want to keep the codebase, here are two options:
- If these messages are only meant for developers reading the code to understand why a module is skipped, replace the dead
logging.debugcalls with regular inline comments. - If you want developers to be able to toggle these logs on from the command line:
- Add the
--verboseflag toargparse. - Configure logging at startup and pass it to the worker command construction.
- Add the
| if not args.keep_pycache: clean_bytecode() | ||
| run_mprofile(args.module) | ||
| else: | ||
| run_master(args.iterations, args.module, args.cpu, args.csv, not args.keep_pycache) No newline at end of file |
There was a problem hiding this comment.
nit: add newline at the end.
- Optimized `clean_bytecode` to use `os.walk` and skip hidden directories (e.g. .git, .venv), preventing slow scans and unwanted cache eviction in developer venvs. - Added graceful fallback to disable CPU pinning on non-Linux platforms instead of crashing. - Ensured `clean_bytecode()` runs consistently for `--trace` and `--cprofile` modes for true cold-starts. - Replaced dead `logging.debug` calls with inline comments to keep the script dependency-free. - Fixed Python style guideline infractions and added a trailing newline.
| with open(file_path, 'r', encoding='utf-8', errors='ignore') as f: | ||
| loaded_lines += sum(1 for _ in f) | ||
| except OSError as e: | ||
| logging.warning(f"Failed to read lines from {file_path}: {e}") |
There was a problem hiding this comment.
I see this reference but it seems like you've removed the logging module.
Resolves a potential NameError crash caused by a leftover `logging` reference after the module was removed in the previous commit.
Removes the pathlib import which became obsolete after the clean_bytecode logic was refactored to use os.walk.
Overview
This PR introduces
profiler.py, a zero-dependency, process-isolated import profiling tool designed to benchmark Python SDK modules with extreme precision and zero dynamic cache interference.To evaluate authentic import initialization overhead without the "observer effect", the profiler leverages a Master/Worker architecture. The master orchestrator executes each benchmark iteration inside a completely isolated, dynamically spawned subprocess. By bypassing Python's
sys.modulescache and sweeping disk-level__pycache__directories, the tool guarantees a 100% authentic "cold-start" evaluation every time.Key Telemetry Vectors & Features
tracemalloc(cross-platform), and pure OS-level physical memory footprints via/proc/self/statmdeltas (Linux RSS) to catch heavy C/C++ backend extensions likeprotobuf.sys.modulesacross the import window, traversing disk paths to aggregate the exact number of physical Lines of Code (LOC) injected into the interpreter.taskseton Linux to eliminate OS-level context-switching jitter across iterations.--trace(-X importtime) and--cprofile.A comprehensive breakdown of the architecture, data pipeline, and trace log analysis is documented in
scripts/import_profiler/documentation.md.Related Links
[go/sdk-performance-design](http://goto.google.com/sdk-performance-design)